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Abstract 

Background: Calorie restriction (CR) is one of the most conserved non-genetic interventions that extends 
healthspan in evolutionarily distant species, ranging from yeast to mammals. The target of rapamycin (TOR) has been 
shown to play a key role in mediating healthspan extension in response to CR by integrating different signals that 
monitor nutrient-availability and orchestrating various components of cellular machinery in response. Both genetic 
and pharmacological interventions that inhibit the TOR pathway exhibit a similar phenotype, which is not further 
amplified by CR. 

Results: In this paper, we present the first comprehensive, computationally derived map of TOR downstream 
effectors, with the objective of discovering key lifespan mediators, their crosstalk, and high-level organization. We 
adopt a systematic approach for tracing information flow from the TOR complex and use it to identify relevant 
signaling elements. By constructing a high-level functional map of TOR downstream effectors, we show that our 
approach is not only capable of recapturing previously known pathways, but also suggests potential targets for future 
studies. 

Information flow scores provide an aggregate ranking of relevance of proteins with respect to the TOR signaling 
pathway. These rankings must be normalized for degree bias, appropriately interpreted, and mapped to associated 
roles in pathways. We propose a novel statistical framework for integrating information flow scores, the set of 
differentially expressed genes in response to rapamycin treatment, and the transcriptional regulatory network. We use 
this framework to identify the most relevant transcription factors in mediating the observed transcriptional response, 
and to construct the effective response network of the TOR pathway. This network is hypothesized to mediate life-span 
extension in response to TOR inhibition. 

Conclusions: Our approach, unlike experimental methods, is not limited to specific aspects of cellular response. 
Rather, it predicts transcriptional changes and post-translational modifications in response to TOR inhibition. The 
constructed effective response network greatly enhances understanding of the mechanisms underlying the aging 
process and helps in identifying new targets for further investigation of anti-aging regimes. It also allows us to identify 
potential network biomarkers for diagnosis and prognosis of age-related pathologies. 

Keywords: Target of rapamycin (TOR), Yeast aging, Interactome, Information flow analysis, Effective response network 



Background 

Cellular aging is a multi-factorial complex phenotype, 
characterized by the accumulation of damaged cellular 
components over the organisms life-span [1]. The pro- 
gression of aging depends on both the increasing rate of 
damage to DNA, RNA, proteins, and cellular organelles, 
as well as the gradual decline of cellular defense mecha- 
nisms against stress. This can ultimately lead to a dysfunc- 
tional cell with a higher risk factor for disease. 
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Limiting caloric intake without causing malnutrition, 
also known as calorie restriction (CR), is one of the 
most conserved non-genetic interventions, which extends 
life-span in evolutionarily distant species ranging from 
yeast to mammals [1-3]. Inhibition of the nutrient-sensing 
pathways, using either genetic or pharmacological inter- 
vention, also results in a similar phenotype [1,2]. More 
importantly, increased lifespan is accompanied by an 
increased healthspan, which delays both the progression 
and the increasing risk-factor for a wide range of age- 
related diseases, including cancers [4-7], cardiovascular 
disease [8-11], and multiple neurodegenerative disorders 
[12-17]. The extent to which these pathologies share 
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their underlying biology is a topic of active investiga- 
tion. Emerging evidence, however, supports the hypoth- 
esis that large classes of age-related diseases are driven 
by similar underlying mechanisms [18]. Understanding 
and controlling these mechanisms, therefore, constitute 
critical aspects of preventing or delaying the onset of 
age-related pathologies. Motivated by these observations, 
considerable effort has been invested in understanding the 
downstream effectors of the nutrient-sensing pathways 
that orchestrate CR-mediated life-span extension. 

The budding yeast, Saccharomyces cerevisiae, has been 
used extensively as a model organism in aging research, 
due to its rapid growth and ease of manipulation [3,19]. 
Having two different aging paradigms - replicative life- 
span (RLS), defined as "the number of buds a mother 
cell can produce before senescence occurs", and chrono- 
logical life-span (CLS), defined as "the duration of viabil- 
ity after entering the stationary-phase", yeast provides a 
unique opportunity for modeling both proliferating and 
post-mitotic cells. Understanding the underlying mecha- 
nisms driving RLS and CLS can ultimately be used to shed 
light on the progression of cancers and neurodegenerative 
diseases, respectively. 

Yeast cells are typically cultured in growth media con- 
taining 2% glucose. Reducing glucose concentration to 
0.5% or less is one of the best characterized CR regi- 
mens in yeast, which increases both CLS and RLS [20-22]. 
The target of rapamycin (TOR) has been shown to play 
a key role in mediating the observed life-span exten- 
sion in response to CR [23]. TOR is a serine/threonine 
protein kinase, which belongs to the conserved family 
of PI3K-related kinases (PIKKs). It was first identified 
using genetic studies in yeast while searching for mutants 
that confer rapamycin-resistance [24] . It was later discov- 
ered that TOR protein kinases, encoded by TORI and 
TOR2 genes in yeast, form two structurally and func- 
tionally distinct multiprotein complexes [25-28]. TOR 
Complex 1 (TORC1) is rapamycin-sensitive and con- 
sists of both TOR proteins, TORI and TOR2, together 
with KOG1, LST8, and TC089. On the other hand, 
TOR Complex 2 (TORC2) does not contain TORI, is 
not inhibited by rapamycin, and contains AVOl, AV02, 
AV03, LST8, BIT2, and BIT61. These two complexes 
correspond to two separate branches of the TOR sig- 
naling network, controlling the spatial and temporal 
aspects of cell growth, respectively, which are conserved 
from yeast to humans [28]. Interestingly, TORC1 also 
has a critical role in aging and age-related patholo- 
gies [29,30]. Many of the known oncoproteins act as 
upstream activators of TORC1, while several tumor 
suppressor proteins inhibit its activity [31,32]. From a 
systems point of view, TORC1 acts as a hub that inte- 
grates various nutrient and stress-related signals and 



regulates a variety of cellular responses [33-35]. Inhibit- 
ing TOR signaling using rapamycin provides a unique 
opportunity to identify its downstream effectors. How- 
ever, these targets may be regulated in different ways, 
including, but not limited to, transcription regulation, 
translational control, and post-translational modifica- 
tions. Capturing various changes that happen during 
rapamycin treatment, in order to create a comprehen- 
sive systems view of the cellular response, is a complex 
task. 

In this paper, we propose a complementary, compu- 
tational approach to reconstruct a comprehensive map 
of TOR downstream effectors. We develop a systematic 
approach to couple random walk techniques with rigorous 
statistical models, integrate different datasets, and iden- 
tify key targets in calorie restriction that are mediated 
by TOR pathway. Using GO enrichment analysis of high 
scoring nodes, we show that information flow analysis 
not only identifies previously known targets of TORC1, 
but also predicts new functional roles for further studies. 
We cross-validate our results with transcriptome profile 
of yeast in response to rapamycin treatment and show 
that our method can accurately predict transcriptional 
changes in response to TORC1 inhibition. Information 
flow scores provide an aggregate ranking of proteins, with 
respect to their relevance to the TOR signaling path- 
way, and are highly susceptible to degree bias. To remedy 
this and to elucidate the roles of underlying signaling 
elements, we propose a novel statistical framework for 
integrating information flow scores, data on regulatory 
relationships, and the expression profile in response to 
rapamycin treatment. 

Using our framework, we identify the most relevant 
transcription factors and construct the effective response 
network of TOR, which is responsible for the observed 
transcriptional changes due to TOR inhibition. Our 
approach, unlike experimental methods, is not limited to 
specific aspects of cellular response. Rather, it predicts 
transcriptional changes, as well as post-translational mod- 
ifications in response to TOR signaling. The resulting 
interaction map greatly enhances our understanding of 
the mechanisms underlying the aging process and helps 
identify novel targets for further investigation of anti- 
aging regimes. It also reveals potential network biomark- 
ers for diagnoses and prognoses of age-related pathologies 
and identifies mechanisms for control of cellular aging 
processes through multi-targeted and combinatorial ther- 
apies [36,37]. 

Results and discussion 

Computing information flow scores from TORC1 

Given the yeast interactome, constructed using the pro- 
cedure detailed in Methods Section and illustrated in 
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Figure 1, we compute information flow scores using ran- 
dom walks initiated at selected nodes in the interactome. 
These nodes comprise members of the TORC1 complex, 
each of which propagates a unit flow (normalized to 0.2 
for each of the five member proteins). We use a dis- 
crete random-walk process in which, at each step, every 
protein aggregates incoming signals and distributes them 
equally among outgoing neighbors. The final information 
flow scores are computed as the steady-state distribution 
of the random-walk process. One of the key parameters 
in the random-walk process, which controls the depth of 
propagation, is called the restart-probability. This is the 
probability that a random walker continues the walk (as 
opposed to teleporting to a node chosen from among a 
set of preferred nodes). In order to give all nodes in the 
interactome a chance of being visited, we use the relation- 
ship between restart probability and the mean depth of 
random-walks by setting parameter a to be equal to 
where d is the diameter of the interactome. For the yeast 
interactome, we determine the diameter to be equal to 6 
and set a = | ~ 0.85, correspondingly (please see the 
Methods section for details of information flow compu- 
tations). Figure 2 illustrates the distribution of computed 



information flow scores, starting from TORC1, as a func- 
tion of node distance from TORC1. It is evident from the 
figure that computed scores are functions of both dis- 
tance from source nodes, as well as multiplicity of paths 
between source and sink nodes. This can be verified from 
the overlapping tails of distributions for nodes at differ- 
ent distances, as well as the varied distribution of scores 
among nodes at the same distance from TORC1. The final 
list of computed information flow scores is available for 
download as Additional file 1. 

Node rankings from the random walk process are sus- 
ceptible to degree-bias, favoring high-degree nodes. To 
remedy this bias and to gain a detailed mechanistic under- 
standing of the roles of various proteins (and associated 
signaling elements), random walk methods need to be 
coupled with appropriate statistical tests. A key contri- 
bution of our work is the development of such a test, 
which yields a fine-grained understanding of key path- 
ways involved in orchestrating cellular response to TOR 
inhibition. To the best of our knowledge, this work rep- 
resents the first application of information flow meth- 
ods for reconstructing the effective response network of 
TORC1. 
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Figure 1 Network integration process. Example of the network integration process around Sch9p. Protein-protein interactions (PPI) and 
post-translational modifications (PTM) were extracted from BioGRID dataset. 
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Figure 2 Distribution of random-walk scores. Information flow versus node distance from TORC1 , showing that random-walk scores are a 
function of distance, as well as multiplicity of paths. 



Constructing a high-level functional map of TOR 
downstream effectors 

TORC1 is not only regulated by the quality and the 
quantity of both carbon and nitrogen sources [34,38-40], 
but also by noxious stressors, such as caffeine [41,42]. 
In response, TORC1 coordinately orchestrates various 
aspects of cellular machinery to mediate cell growth 
[32,40]. This includes autophagy [43], stress response 
[42,44], and protein synthesis (by regulating ribosome 
biogenesis [45], translation initiation [46], and nutrient 
uptake [47,48]). 

In order to systematically identify the functional aspects 
relevant to TOR signaling, we first rank the proteins in 
the yeast interactome based on their information flow 
scores from the TORC1 complex. Given this ranked list, 
we aim to identify functional terms that are highly over- 
represented among top-ranked proteins. Gene Ontology 
(GO) [49] enrichment analysis has been used extensively 
for this purpose. We employ GOrilla [50] to find the 
optimal cut for each GO term, together with its exact min- 
imum hypergeometric (mHG) p-value. Next, we enforce 
a threshold of ^-value < 10 -3 to identify the signifi- 
cant terms. The complete list of enriched terms for each 
branch of GO is available for download as Additional 
files 2. 

GO provides a hierarchical vocabulary to annotate bio- 
logical processes (BP), molecular functions (MF), and 
cellular components (CC). This hierarchical structure, 



represented using a directed acyclic graph (DAG), intro- 
duces an inherent dependency among the significant 
terms identified by GO enrichment analysis. Furthermore, 
seemingly independent terms under different branches of 
GO may be used to annotate the same set of genes. To 
provide a compact, non-redundant representation of the 
significant terms in our experiment, we follow a two-step 
process. First, we extract the subset of enriched terms 
that are marked by the Saccharomyces Genome Database 
(SGD) [51] as GO slim. Yeast GO slim is a compact sub- 
set of the entire GO, selected by SGD curators, which is 
necessary and sufficient to describe different aspects of 
yeast cellular biology. Next, we use EnrichmentMap (EM) 
[52], a recent Cytoscape [53] plugin, to construct the net- 
work (map) of the enriched terms. In this network, unlike 
the original interactome, each node represents a signifi- 
cant GO slim term and each weighted edge indicates the 
extent of overlap between genesets of their correspond- 
ing terms. We use a custom visualization style to illustrate 
various network properties. GO terms under BP, MF, and 
CC branches are color-coded red, green, and blue, respec- 
tively. The p-value of each term determines the opacity 
of both the node and its label; the bolder a term appears, 
the more significant its enrichment score. Finally, the total 
number of enriched genes for each GO term is shown 
using the size of the corresponding node. The final map, 
which is shown in Figure 3, is available for download as 
Additional file 3. This map provides unique opportunities 
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for studying TOR-dependent terms visually, since terms 
(nodes) representing relevant sets of genes tend to cluster 
together in this network. 

First, we note that most of the previously known tar- 
gets of TORC1 are also identified by our information 
flow method, as represented in the enrichment map. For 
example, all terms related to ribosome biogenesis, includ- 
ing relevant cellular components (such as ribosome and 
nucleolus), molecular functions (such as rRNA binding 
and structural constitute of ribosome), and biological pro- 
cesses (such as rRNA processing and ribosomal subunit 
export from nucleus), are clustered in the bottom-left 
corner of the map. These terms, interestingly, are also 
clustered with other terms related to protein synthesis, 
such as regulation of translation, translational initiation, 
and cytoplasmic translation. Furthermore, many of the 
terms related to stress-response, such as response to DNA 
damage stimulus and DNA repair, are clustered in the 



bottom-left corner of the map. Finally, many of the terms 
related to TOR signaling, nutrient uptake, and cytoskel- 
eton organization are grouped on the top section of 
the map. 

Additionally, we observe that there are terms in this 
map that have not been adequately investigated in previ- 
ous efforts. For example, even though translational control 
is a well-known function of TORC1, transcriptional con- 
trol is less-studied. Several terms related to transcription 
initiation and elongation are enriched in our analysis, as 
shown on the bottom-right of the map. In order to gain a 
mechanistic understanding of these terms, we project the 
geneset of each term (node) back to the original network 
and construct the corresponding induced subgraph in the 
yeast interactome. As a case study, we extract the set of 
enriched genes represented by the transcription initiation 
GO term and construct its induced subgraph, which is 
shown in Figure 4. Here, nodes, representing proteins, are 
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Figure 3 Enrichment map of yeast GO slim terms. Enriched terms are identified by mHG p-value, computed for the ranked-list of genes based on 
their information flow scores. Each node represents a significant GO term and edges represent the overlap between genesets of GO terms. Terms in 
different branches of GO are color-coded with red, green, and blue. Color intensity of each node represents the significance of its p-value, while the 
node size illustrates the size of its geneset. Thickness of edges is related to the extent of overlap among genesets. 
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grouped and annotated based on their functional role in 
forming the transcription pre-initiation complex (PIC), as 
well as the RNA polymerase (RNAP). The basal level of 
transcription in Eukaryotic cells by RNAP needs a family 
of general transcription factors (GTF), prior to the forma- 
tion of PIC. The TATA-binding protein (TBP), encoded by 
the Sptl5 gene in yeast, is a universal GTF that is involved 
in transcription by all three types of nuclear RNAP. As a 
component of TFIIIB complex, it forms the PIC complex 
and recruits RNAPIII to the transcriptional start site(TSS) 
of tRNAs, 5S rRNA, and most snRNAs. As a part of 
TFIID, it forms a complex together with TBP-associated 
factors (TAF) and binds to the core promoter region of 
the protein-coding genes, as well as some snRNAs. The 
correct assembly of PIC, required for directing RNAPII 
to the TSS, needs additional GTFs, namely TFIIA, -B, -D, 
-E, -F, and TFIIH, as well as the Mediator (MED) complex. 
These components are assembled in an orderly fashion 
to form the PIC and mediate the transcription initiation 
by RNAPII (please see Hampsey [54] and Maston et al. 
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Figure 4 TOR-dependent control of transcription initiation. 

Induced subgraph in the yeast interactome, constructed from the 
top-ranked genes in the information flow analysis that are annotated 
with the transcription initiation GO term. Different functional subunits 
are marked and color-coded appropriately. 



[55] for a review). These complex interactions are faith- 
fully reconstructed in Figure 4, which provides a more 
refined understanding of transcription initiation, under 
TOR control, in the yeast cells. 

Comparison of predicted targets to the set of differentially 
expressed genes in response to Rapamycin treatment 

Rapamycin, a lipophilic macrolide originally purified as an 
antifungal agent and then re-discovered as an immuno- 
suppressive drug, forms a toxic complex with its intra- 
cellular receptor FKBP12, encoded by the Fprl gene in 
yeast, and directly binds to TOR in order to perform 
its inhibitory action [32]. We hypothesize that if the 
information flow-based method agrees with the TORC1 
signaling network, it should be able to predict transcrip- 
tional changes due to rapamycin treatment, which inhibits 
TORC1 in vivo. To validate this hypothesis, we used a 
recent mRNA expression profile of yeast in response to 
rapamycin treatment [56]. We extracted the set of dif- 
ferentially expressed genes, at a minimum threshold of 
2-fold change, and constructed a vector of true positives 
from this set by filtering out genes that do not have a 
corresponding vertex in the yeast interactome. The final 
dataset includes 342 repressed and 237 induced genes in 
our experiment. 

Using this set of true-positives, we computed the enrich- 
ment plot of information flow scores by ranking all 
proteins and computing the hypergeometric score as 
a function of the protein rank, which is illustrated in 
Figure 5. The peak of the plot, corresponding to the min- 
imum hypergeometric (mHG) score, occurs at the index 
/ = 906 from the top, which covers approximately the 
top 15% of scores. There are 181 positive genes in this 
partition, from a total of 579 positives, yielding a mHG 
score of 1.11 * 10 -22 . We computed the exact p-value 
corresponding to this mHG score, using the dynamic- 
programming method of Eden et al. [57], resulting in the 
significant enrichment p-value of 3.25 * 10 -19 . This in 
turn supports our hypothesis that the random-walk neigh- 
borhood of TORC1 is highly enriched with the set of genes 
that are differentially expressed in response to rapamycin 
treatment. 

Post-translational modifications among top-ranked 
proteins: a case study on Gapl regulation 

An interesting observation from Figure 5 is that the 
highest-ranked genes (approximately the top 150 genes), 
marked with a red box, are not enriched in terms 
of rapamycin-induced genes. This can be explained by 
the fact that regulatory elements in the TOR signal- 
ing pathway, including TFs, do not typically change 
their expression level in response to TOR signaling. 
Instead, they are targeted for post-translational modi- 
fications (typically, phosphorylation). We consequently 
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Figure 5 Enrichment plot for rapamycin-treatment dataset. Enrichment score as a function of the score percentage. Computations are based 
on the set of differentially expressed genes in response to rapamycin treatment. The peak of plot occurs at around top 15% of scores, with the 
corresponding exact p-value of 3.3*10-19. 



hypothesize that the top genes should also be enriched 
in terms of phosphorylation events. To further inves- 
tigate this hypothesis, we focus on a case study of 
Gapl regulation, a general amino acid permease reg- 
ulated by NCR. We choose Gapl since its regula- 
tory pathway, originating from TORC1, is well-studied 
in literature. Moreover, data from phosphoproteomic 
experiments, which measures phosphorylation events 
among elements of this pathway, is readily available. 
Specifically, Gapl is positively regulated via Gln3 and 
Gatl, while it is repressed by Gzf3 and Dal80 [34,40]. 
Interestingly, all four of these regulators are among 
top-ranked transcription factors, yet none of them are dif- 
ferentially expressed in response to rapamycin treatment. 
Using a recent phosphoproteome of yeast in response 
to rapamycin treatment [58], we validated that both 
of the transcriptional activators of Gapl, namely Gln3 
and Gatl, are highly phosphorylated in response to 
rapamycin treatment. Moreover, Tap42-Sit4, which is the 
upstream regulator of Gcn4, is indirectly regulated by 
TORC1. 

Figure 6 illustrates this signaling pathway, with each 
element annotated using its information flow rank. All sig- 
naling elements upstream of Gapl are present among top- 
ranked scores, yet none of them change their expression 
levels in response to rapamycin treatment. This partially 
supports our hypothesis that the top-ranked genes in the 
random-walk are primarily targets of post-translational 



modifications. However, a more thorough experimental 
analysis of the the top-ranked proteins potentially may 
reveal currently unknown mechanisms by which yeast 
cells respond to TOR signaling. To this end, our com- 
putational studies motivate and provide data for future 
experimental investigations. 

Sensitivity and specificity of information flow scores in 
predicting key transcription factors 

Top-ranked proteins in information flow analysis are 
highly enriched in terms of differentially expressed genes 
under rapamycin treatment. However, TORC1 does not 
directly regulate expression of these genes. This observa- 
tion raises the question: which transcription factors are 
responsible and which intermediary elements are involved 
in these regulations? We answer the first question here, 
while deferring the latter to subsequent sections. 

To find the key transcription factors that modulate 
the observed transcriptional response, we use two sepa- 
rate statistical predictors, one based on the information 
flow scores and the other based on the set of differen- 
tially expressed genes. These predictors allow us to assess 
the significance of TFs with respect to their computa- 
tionally computed, top-ranked and experimentally vali- 
dated targets, respectively. In the first method, we call 
a transcription factor relevant if a significant fraction of 
its target genes are highly-ranked in information flow 
method. Conversely, in the second method we define 
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Figure 6 TORC1 -dependent regulation of Gapl . The schematic 
diagram is based on literature evidence for the known interactions. 
Each node in the signaling pathway is annotated with the rank of its 
information flow score from TORC1 and colored with its functional 
classification. Yellow nodes represent kinase associated proteins, red 
nodes are transcription factors, and blue node (Sit4) is a phosphatase. 
The rest of nodes have a default color of grey. Ranking of nodes 
based on their information flow scores coincides with our prior 
knowledge on the structure of this pathway. Top/bottom ranked 
nodes are discriminated using the computed cutoff value (/) based on 
differentially expressed genes. The "?" indicates an unknown 
underlying mechanism, yet to be discovered, that connects TORC1 to 
the rest of transcription factors. 



the relevance in terms of the portion of its differentially 
expressed targets (please see Equations 6-8 for details). 

We use j?-value(X = kj) and j??-value(y = hp) and 
apply a cutoff value of e = 0.01 to identify significant p- 
values computed for computational and experimental pre- 
dictions, respectively. At this threshold, we compute the 
sensitivity and specificity of information flow methods as 
0.2245 and 0.9846, respectively. The observed high speci- 
ficity value suggests that if targets of a given TF are not 



differentially expressed, with high probability, our compu- 
tational model also reports it as a negative (it will not have 
significant number of top-ranked targets). In other words, 
transcription factors that are identified as significant using 
information flow scores are highly precise. On the other 
hand, the lower sensitivity score implies that even if a TF 
has many differentially expressed targets, our computa- 
tional method may miss it. From this, we can conclude 
that transcription factors that have significant numbers 
of top-ranked targets are high-confidence candidate(s) as 
downstream effectors of TORC1. However, there are cases 
where we may miss relevant transcription factors with 
a significant number of differentially expressed genes by 
this approach. In the next section, we propose a statisti- 
cal framework to integrate information flow scores and 
expression profiles to reliably identify the most relevant 
subset of transcription factors that are involved in medi- 
ating the transcriptional response to TOR inhibition, and 
consequently construct the effective response network of 
TORC1. 

Identifying the most relevant transcription factors 

We now seek to integrate experimental measurements 
from rapamycin treatment, information flow scores, and 
the transcription regulatory network into a unified frame- 
work to identify the most relevant transcription factors. 
To this end, we introduce the notion of relevance score. 
Let random variable Z denote the number of top-ranked 
positive targets, and I<tp denote the number of top-ranked 
positive targets of a given TF. We define the relevance 
score as — log(/?- value (Z = 1<tp))^ The relevance score 
assesses both positivity and rank of the targets for a 
given TF (please see Equation 10 for details). Using this 
approach, we identify 17 TFs with high relevance scores, 
which are hypothesized to be responsible for the tran- 
scriptional changes in a TORC1 -dependent manner. The 
complete list of computed statistics for all transcription 
factors is summarized in Additional file 4. 

The top five transcription factors are listed in Table 1. 
Among these top-ranked, high confidence, transcrip- 
tion factors, Sfpl, Gln3, and Gcn4 are well-documented 
downstream effectors of TORC1 [48,59-61] (please see 
Zaman et al. [34], Smets et al. [40], and Loewith and Hall 
[32] for a more comprehensive review). Sfpl is a stress- 
and nutrient-sensitive regulator of cell growth, respon- 
sible for mediating the expression of genes involved in 
ribosome biogenesis, such as RP genes and RiBi factors 
[62,63]. TORC1 mediates Sfpl-related genes by phos- 
phorylating Sfpl and regulating its nuclear localization 
[59]. Gln3, a GATA-family transcription factor, positively 
regulates the expression of nitrogen catabolite repres- 
sion (NCR) -sensitive genes [60,64]. TORCl-dependent 
regulation of Gln3 is mediated by promoting its associ- 
ation with its cytoplasmic anchor protein Ure2 [32,65]. 
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Table 1 Top-ranked transcription factors with high 
confidence scores 



TFORF 


TF name 


TF rank 


TF confidence 


YLR403W 


SFP1 


22 


43.5048 


YER040W 


GLN3 


148 


57.7734 


YML007W 


YAP1 


618 


24.3672 


YEL009C 


GCN4 


638 


4.822 


YHR084W 


STE12 


825 


2.9668 



Gcn4 is a nutrient-responsive transcription factor, which 
is activated upon amino acid starvation [66]. TORC1 
regulates Gcn4 by mediating its translation level in a 
eIF2a -dependent manner [32]. Interestingly, Steffen et al. 
[61] also proposed a critical role for Gcn4 in mediating 
life-span in yeast. 

However, to the best of our knowledge, Stel2 and 
Yapl have not been previously positioned downstream 
of TORC1. Stel2 is best known as a downstream tar- 
get of mitogen-activated protein kinase (MAPK) sig- 
naling cascade and is responsible for regulating genes 
involved in mating or pseudohyphal/invasive growth [67]. 
Rutherford et al. [68] show that over-expression of the 
ammonium permease Mep2 induces the transcription of 
known targets of Stel2. A more recent study by Santos 
et al. [69] additionally positions TORC1 downstream of 
Mep2, which, taken together with the link between Mep2- 
Stel2, suggests Stel2 as a potential downstream effector 
of TORC1. Yapl is an AP-1 family transcription factor 
required for inducing oxidative [70,71] and carbon [72] 
stress responses, the latter is proposed to be independent 
of TORC1. Additionally, Yapl expression has been shown 
to increase significantly during replicative aging [73]. It 
has been suggested that spermidine, a conserved longevity 
factor [74], mediates macroautophagy in a Yapl and Gcn4 
dependent manner [75]. Finally, there is a diverse set of 
age-related functions associated with Yapl, many of which 
are also attributed to TORC1. These observations sug- 
gest Yapl as a potential candidate downstream effector of 
TORC1. 

Constructing the effective response network of TORC1 

To uncover the regulatory mechanisms that mediate the 
response to TOR inhibition, we construct the effective 
response network (ERN) of TORC1, which is illustrated 
in Figure 7 and is available for download as Additional 
file 5. Node attributes for this network are available for 
download separately as Additional file 6. This network 
consists of the most relevant TFs, together with their top- 
ranked positive targets, with a total of 1,288 regulatory 
interactions between 17 transcription factors and 181 
target genes. 



In order to better understand the functional roles of 
the predicted targets, we use FIDEA [76] to identify 
enriched GO terms under the biological process (BP) 
branch. Figure 8 illustrates the static word cloud of 
the enriched terms, as generated by FIDEA, the com- 
plete list of which is available for download as Addi- 
tional file 7. Unlike the enrichment map of TORC1, 
which spans a variety of different functions, targets in 
the effective response network (ERN) are almost exclu- 
sively involved in ribosome biogenesis and the cellular 
translation process. Ribosome biogenesis is one of the 
most energy-consuming tasks in the cell that is directly 
linked to the rate of translation and is required for cell 
growth [77]. Calorie restriction, or alternatively inhibiting 
TORC1 by Rapamycin treatment, is known to coordi- 
nately regulate this process via a complex set of path- 
ways involving transcription factors Ifhl, Sfpl, Fhll, and 
Rapl [77]. Interestingly, all four of these transcription 
factors are identified by our method among the top 6 
TFs with the highest relevance scores (together with 
Gcn3 and Met4). The effective response network pro- 
vides a refined view of how yeast cells re-wire various 
aspects of ribosome biogenesis in order to modulate cell 
growth. This network can be used to gain a detailed 
understanding of the regulatory mechanisms that are 
responsible for TOR-dependent transcriptional changes 
in yeast. 

Conclusions 

Understanding various processes associated with aging 
has important implications for the diagnosis, prog- 
nosis, and treatment of age-related pathologies. Cur- 
rent methods for constructing aging pathways rely on 
detailed experiments that study cellular response to care- 
fully controlled signals. This process is expensive, time- 
consuming, and typically restricted to specific aspects of 
cellular response. In this study, we presented a comple- 
mentary, computational approach that aims to construct 
detailed aging pathways using the yeast interactome by 
initiating random walks at proteins that are key play- 
ers in the aging process (the target of rapamycin or 
TOR, in this study). At the heart of our method is a 
rigorous statistical and computational framework that 
identifies significant effector proteins and provides infor- 
mation about the specific mechanisms associated with 
them. 

We present comprehensive validation of our computa- 
tional results through GO enrichment studies and man- 
ual curation to show that our method identifies most 
of the known proteins downstream from TOR, while 
identifying several new proteins for future experimental 
investigations. Additionally, we showed that information 
flow scores faithfully predict transcriptional changes in 
response to rapamycin-treatment, which validates accu- 
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Figure 7 Effective response network (ERN) of TORC1. The effective response network is computed for most relevant transcription factors, 
yielding a network of 1 ,288 transcription regulations between 1 7 TFs and 181 target genes. Green nodes represent TFs while blue nodes are the 
target genes. The size and and color intensity of TFs and target genes represent their relevance score and information flow score, respectively. 



racy of predicted effectors. Furthermore, we show that 
the predicted targets are also enriched with proteins that 
are post-translationally modified (i.e., phosphorylated) 
in response to TOR inhibition. Finally, we constructed 
the effective response network of the TOR pathway. 
This network is hypothesized to mediate transcriptional 
changes in response to TOR inhibition. A direct out- 
come of our study is a complete dataset of proteins down- 



stream of TOR, their interactions, functional roles, and 
organization. 

Methods 

Constructing yeast interactome 

We obtained the yeast protein-protein interactions (PPI) 
from the BioGRID [78] database, update 2011 [79], ver- 
sion 3.1.83, by extracting all physical interactions, except 
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Figure 8 Enrichment analysis of the ERN. Static word cloud for the enriched BP terms in the effective response network (ERN). 



for protein-RNA interactions, and excluding interspecies 
and self interactions. This dataset consists of 103,619 
(63,395 non-redundant) physical interactions among 
5,691 proteins, and is available for download as Additional 
file 8. We then identified the subset of interactions associ- 
ated with post-translational modification (PTM), marked 
with the "biochemical activity" evidence code in BioGRID, 
resulting in 5,791 (5,443 non-redundant) biochemical 
activities among proteins in yeast. These are available for 
download as Additional file 9. Each of these interactions 
represents a directional enzymatic activity, where the bait 
protein executes the activity on the substrate hit protein. 
After integrating different modifications among similar 
pairs of proteins, we obtained 5,421 directional edges 
among 2,002 proteins in the yeast interactome. The bulk 
of these interactions (over 4,000) are the phosphorylation 
events identified by Ptacek et al. [80] using proteome chip 
technology. 

We constructed the integrated network of yeast inter- 
actions, the yeast interactome, by integrating protein- 
protein interactions(PPIs) and post-translational modifi- 
cations(PTMs). For pairs of proteins that have both PPI 
and PTM, we give higher priority to PTM, since it pro- 
vides a more refined description of the type of interaction. 



Figure 1 illustrates an example of the integration process 
around the Sch9 protein, which is a well-documented sub- 
strate of TORC1. The final constructed interactome is 
available for download as Additional file 10. This network 
consists of 5,287 uni-directional and 58,108 bi-directional 
edges (58,041 PPIs and 134 bi-directional PTMs) among 
5,691 nodes. The node attributes and alternative labels for 
each node in the yeast interactome are also available for 
download as Additional file 11. 

Transcriptional regulatory network (TRN) of yeast 

We constructed the yeast transcriptional regulatory net- 
work (TRN) from the documented regulations in YEAS- 
TRACT [81], consisting of 48,082 interactions between 
183 transcription factors (TF) and 6,403 target genes 
(TG). Among these 183 TFs, 179 of them have a corre- 
sponding node in the yeast interactome. 

Tracing information flow in the interactome 

We use a computational approach, based on a discrete- 
time random walk process, to track directional informa- 
tion flow in the interactome. Similar formulations have 
been previously used to prioritize candidate disease genes 
[82,83], discover network bio-markers for cancer [84], and 
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identify protein complexes [85,86]. Additionally, there is 
a known correspondence between random-walk methods 
on undirected graphs and formulations based on circuit 
network models [87]. Our formulation takes into account 
both network distances, as well as multiplicity of paths 
between pairs of proteins. It also benefits from using 
edge directions (when available) to discriminate between 
upstream regulators and downstream effectors. 

Let G = (V,E) be a mixed graph, having both directed 
and undirected edges. Each node in V corresponds to a 
protein and edge (u, v) e E iff protein u interacts with 
protein v in the integrated network. Graph G can be repre- 
sented using its adjacency matrix A, where Ay = 1, if node 
i has a directed edge to node and is 0 otherwise. Undi- 
rected edges are replaced by a pair of directed edges in 
each direction. A random walk on G, initiated from vertex 
v, is defined as a sequence of transitions among vertices, 
starting from v. At each step, the random walker randomly 
chooses the next vertex from among the neighbors of the 
current node. The sequence of visited vertices generated 
by this random process is a Markov chain, since the choice 
of next vertex depends only on the current node. We can 
represent the transition matrix of this Markov process as 
a column-stochastic matrix, P, where pij = Pr(S t +i = 
Vj\S t = Vj), and random variable S t represents the state of 
the random walk at the time step t 

Random walk with restart (RWR) is a modified Markov 
chain in which, at each step, a random walker has the 
choice of either continuing along its path, with probability 
a, or jump (teleport) back to the initial vertex, with prob- 
ability 1 — a. Given the transition matrix of the original 
random walk process, P, the transition matrix of the mod- 
ified chain, M, can be computed asM = aP+(l — a)e v l T , 
where e v is a stochastic vector of size n having zeros 
everywhere, except at index v, and 1 is a vector of all 
ones. The stationary distribution of the modified chain, 
7i v (a), defines the portion of time spent on each node 
in an infinite random walk with restart initiated at node 
v, with parameter a. This stationary distribution can be 
computed as follows: 



jr v (a) = Mn v (a) 



= (aP+(l-a)e v V)jr v (a) 



(1) 



Enforcing a unit norm on the dominant eigenvector to 
ensure its stochastic property, || n v (ot) \\\= l T n v = 1, we 
will have the following iterative form: 



Jt v (a) = aPjr v (a) + (1 - a)e v 



(2) 



which is a special case of the personalized PageRank 
[88-91], with preference vector e v . Alternatively, we can 



compute tt v directly by solving the following linear 
system: 



jr v ( a ) = (1 -a)(I- aP)~ l e v 



(3) 



where the right-multiplication with e v simply selects col- 
umn v of the matrix Q. The factor 1 — a can be viewed 
as the decay factor of the signal; the higher the parameter 
a, the further the signal can propagate. Let us denote by 
random variable R the number of hops taken by random 
walker before it jumps back to source node v. Then, R fol- 
lows a geometric distribution with probability of success 
(1 — a) and the expected (mean) length of paths taken by 
random walker can be computed as E(R) = In other 

words, if we let a = for a given value of d, we expect 
the average length of paths taken by such a random walk to 
be equivalent to d, thus we call d the depth of the random 
walk. 

Cross-validating information flow scores with the set of 
differentially expressed genes in response to TOR 
inhibition 

Given the list of gene products ranked by their informa- 
tion flow scores, we want to assess the enrichment of 
differentially expressed genes, in response to rapamycin 
treatment, among top-ranked proteins. 

The classical approach to this problem is to select a pre- 
defined cutoff on ranks, denoted by /, which separates the 
top-ranked genes (target set) from the rest (background 
set), and then compute the enrichment p-value using 
the hypergeometric distribution. Let us denote the total 
number of gene products by N and the total number of 
differentially expressed genes (true positives) by A, Using 
a similar notation as Eden et al. [57], we encode these 
annotations using a binary vector, X = "k\, X2, . . . A/y e 
{0, 1}^, having exactly A ones and N — A zeros. Let the 
random variable T denote the number of positive genes 
in the target set, if we distribute genes randomly. In this 
formulation, the hypergeometric p-value is defined as: 

^-value(r = bi(X)) = Prob(bi(X) < T) 

= HGT(b l (k)\N,A,l) 

min(A,l) 

= E 

t=b t {X) 



C(A t t)C(N -AJ-t) 



C(N, I) 



(4) 



where HGT is the tail of hypergeometric distribution, and 
b[(X) = Yl\=i ls tne number of observed positives in 
the target set. The drawback of this approach is that we 
need a predefined cutoff value, /. To remedy this, Eden 
et al. [57] propose a two-step method for computing the 
exact enrichment p-value, called mHG p-value, without 
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the need for a predefined cutoff value of /. In the first 
step of this process, we identify an optimal cut, over all 
possible cuts, which minimizes the hypergeometric score. 
The value computed in this manner is called the minimum 
hypergeometric (mHG) score, and is defined as: 



mHG(X) = min^MHGTibiiXWtAJ) 



(5) 



Next, we use a dynamic programming (DP) method to 
compute the exact p-value of the observed mHG score, in 
the state space of all possible X vectors with size N hav- 
ing exactly A ones (please refer to Eden et al. [57] for 
algorithmic details, and Eden [92] for an efficient imple- 
mentation). We adopt this strategy to cross- validate our 
results with the transcriptome profile of yeast cells in 
response to rapamycin treatment. We subsequently define 
the enrichment plot, which illustrates the absolute value of 
the logarithm of the HG score as a function of cutoff per- 
centage. The minimum hypergeometric (mHG) score can 
be viewed as the peak of this plot, and the correspond- 
ing exact p-value can be computed for this peak using the 
aforementioned DP algorithm. 

Assessing the sensitivity and the specificity of information 
flow scores 

Given an optimal cutoff length / (computed for differ- 
entially expressed genes in response to TOR inhibition), 
which partitions nodes into top/bottom ranked proteins, 
together with a transcription factor (TF) of interest, p^ 
we are interested in assessing the importance of pi in 
mediating the observed transcriptional response. In other 
words, given that pi has a significant number of top- 
ranked targets, how confident are we that it will also have 
a significant number of differentially expressed targets? 
Conversely, if pt has many differentially expressed targets, 
how likely is it to see its targets among top-ranked genes? 

Let us denote the total number of targets of TF pi by 
/<, and the number of its positive (differentially expressed) 
and top-ranked (in information flow) targets by kp and 
I<t, respectively. Let the random variable X be the number 
of top-ranked targets, if we were uniformly distributing k 
targets of pi among all genes in the yeast interactome. Sim- 
ilarly, let Y be the number of positive targets of pu if we 
distribute positive targets uniformly. Then, we can com- 
pute the following p-values for top-ranked and positive 
targets, respectively: 



^-value(X = k T ) = Prob(k T < X) 
= HGT(k T \N, Ik) 

min{l,k) 

= E 

X=I<t 



C(l,x)C(N-l,k-x) 
C(N,k) 



(6) 



p-vahie(Y = k P ) = Prob(k P < Y) 

= HGT(kp\N,A, k) 

min{A,k) 

= E 

y=k P 



C(A,y)C(N - A,k - y) 
CAN.k) 



(7) 



After correcting for multiple hypothesis testing using 
Bonferroni method, we use a given threshold value of e 
and define the sensitivity and specificity for the entire set 
of transcription factors as: 



sensitivity = Prob (p-value (X = kj) 

< e|^-value(r = kp) < e) 
specificity = Prob(e < p-vdlue(X = /rr)|e 

< p-vdlue(Y = kp)) 



(8) 



The motivation behind our approach is that the set of 
transcription factors with a significant number of differ- 
entially expressed targets provides us with an experimen- 
tally validated set of key factors, whereas transcription 
factors that have a significant number of top-ranked tar- 
gets act as computational predictions for identifying the 
most relevant TFs. Let TP be the number of identified 
true positives, P be the total number of positives, and 
FN be the number of false negatives. The sensitivity of a 
method, defined as ^ = T] ^ FN > measures the fraction 
of positive instances (transcription factors having a signif- 
icant number of differentially expressed targets) that are 
also predicted using the information flow method (com- 
putational predictions). Conversely, let TN be the number 
of true negatives identified by the method and N be the 
total number of negatives. Specificity, formally defined as 
^ = T ™ Fp , corresponds to the fraction of irrelevant 
TFs, computed based on the experimental dataset, that are 
also identified as irrelevant by our computational predic- 
tions. These two measures are closely related to type I and 
II errors as follows: 



Type I error (a) = False-positive-rate (FPR) 

= 1 — specificity 
Type II error(/3) = False-negative-rate (FNR) 

= 1 — sensitivity 



(9) 



Integrating computational predictions with experimental 
datasets 

Given the set of differentially expressed genes in response 
to rapamycin treatment, the computed information flow 
scores, and the transcriptional regulatory network (TRN) 
of yeast, we aim to construct an integrative statisti- 
cal framework to identify the most relevant transcrip- 
tion factors with respect to mediating the transcriptional 
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response to TOR inhibition, and decipher the underlying 
effective response network. 

Let us denote the number of top-ranked positive tar- 
gets of a given TF by kjp. If we compute the probability of 
observing kjp or more positive targets among top-ranked 
genes, entirely by chance, we can subsequently identify the 
associated subset of relevant transcription factors. Let the 
random variable Z denote the number of top-ranked posi- 
tive targets, if we were randomly distributing all targets for 
the given TF. We can compute the p-value of Z by condi- 
tioning it on the number of top-ranked targets as follows: 

p- value (Z = kjp) = Prob(I<TP < Z) 

min (l,k) 

= p rob(k TP < Z\X = x) 

X=I<TP 

x ProbiX = x) 

min{l,k) min(x,bi(X)) 

= Prob{Z = z\X = x) 

X—/<TP Z—kpp 

x ProbiX = x) 

min{l,k) 

= HG(x\N,l,k) 

X=I<TP 

min(x,bi(k)) 

x J2 HG(z\l,bi(X),x) 

Z=kpp 

(10) 

After correcting for multiple hypothesis testing using 
Bonferroni method, we define the relevance score of each 
TF as — log 10 (p-value (Z = I<tp))> and construct the 
effective response network of TORC1 using the most rel- 
evant TFs, together with their top-ranked positive targets, 
correspondingly. 

Availability of supporting data 

The data sets supporting the results of this arti- 
cle are included within the article (and its additional 
files). They are also available for download from http:// 
compbio.soihub.org/projects/torcl. All necessary codes 
and datasets to reproduce the results in this paper are 
bundled as Additional file 12. 

Additional files 



Additional file 1 : Information flow scores. Excel table file (*.xls) 
illustrating the computed information flow scores for different proteins, 
sorted based on their proximity to T0RC1 . 

Additional file 2: GO Enrichment of information flow scores. Excel 
table files (*.zip) illustrating the GO enrichment analysis using mHG under 
BP, CC, and MF branches, respectively. 



Additional file 3: Enrichment map of the significant GO terms. 

Cytoscape file (*.zip) containing the final enrichment map. Each node 
represents a GO term and the corresponding geneset and enrichment 
p-value are encoded in its attributes. Edges are computed based on the 
geneset overlap among GO terms (please see www.cytoscape.org for 
more information about loading the file). 

Additional file 4: Statistical analysis of the most relevant 
transcription factors (TF). Excel table file (*.xls) illustrating the computed 
statistics for different transcription factors, sorted based on their proximity 
toTORG. 

Additional file 5: Effective response network (ERN) of TORCl.Tab 

separated file (*.txt) representing the edge list of the effective response 
network of T0RC1 (corresponding nodes are marked by their systematic 
name). 

Additional file 6: Node attributes for the effective response network. 

Tab-separated table file (*.txt) file containing additional attributes for the 
nodes in the ERN. 

Additional file 7: Functional enrichment of the targets in the 
effective response network. Tab-separated table file (*.txt) file containing 
the enriched BP terms in the ERN, identified by FIDEA. 

Additional file 8: Yeast protein-protein interactions. Tab-separated 
table file (*.txt) containing the non-redundant PPI edge list (corresponding 
nodes are represented using their Entrez Gene ID). 

Additional file 9: Yeast biochemical activities. Tab-separated table file 
(*.txt) containing the list of non-redundant PTM interactions 
(corresponding nodes are represented using their Entrez Gene ID). 

Additional file 10: Yeast interactome. Simple interaction file (*.txt) 
containing the integrated network of the yeast interactome 
(corresponding nodes are represented using their Entrez Gene ID). 
Protein-protein interactions (marked with pp) are undirected while the rest 
of edges are directed. 

Additional file 1 1 : Node attributes for the yeast interactome. 

Tab-separated table file (*.txt) file containing various ID(s) for each node in 
the yeast interactome. 

Additional file 1 2: Code/dataset bundle. Compressed ZIP file (*.zip) 
containing all codes and datasets used in this experiment. 
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